#read in data
dat<-extract_tables("HW5/Long-term_Monitoring_Cape_May_NJ.pdf", pages = 8)

#create tibble
dat2 <- (as_tibble(dat[[1]])
       %>% row_to_names(row_number = 1, remove_row = TRUE) #fix column names,set them to first row
       %>% rename(Day = 1) #name first column
       %>% select(!(Avg)) #remove daily avg data
       %>% filter(Day != "Avg") #remove yearly avg data
       %>% pivot_longer(cols = -Day, names_to = "Year", values_to = "Mean",
                        values_transform = list(Mean = as.integer))
       %>% drop_na(Mean) # remove NAs
       %>% mutate(Year = factor(year(ymd(Year, truncated = 2)))) #convert to 4 digit year
       #use one year for all month_day data, because I want to be able to plot all the same
       #month_day data on the same x-coordinate regardless of year. What is the better way 
       #to do this?
       %>% mutate(Day_Month = dmy(paste(Day, 2000, sep = " ")))
       %>% arrange(Day_Month, Year)
       %>% group_by(Year)
       %>% mutate(MA_2=zoo::rollmean(Mean, k=2, fill=NA)) #compute 2-day moving average/year
       %>% mutate(MA_3=zoo::rollmean(Mean, k=3, fill=NA)) #compute 3-day moving average/year
       %>% ungroup()

)

#create plot
gg <- (ggplot(dat2, aes(x=Day_Month, y=Mean, colour = Year))
       + geom_point(alpha=0.4)#how do I set alpha=1 in ggplotly legend
       + geom_line(aes(y=MA_2))
       + geom_line(aes(y=MA_3), lty=6)
       # how do I add linetype aesthetic to ggplotly legend
       + geom_text(aes(x=as.Date("2000-10-17"), 
                       y=900, label="2-day Moving Average (solid line)"), color="#482173FF")
       + geom_text(aes(x=as.Date("2000-10-18"), 
                       y=800, label="3-day Moving Average (dashed line)"), color="#482173FF")
       + scale_y_sqrt() #scale y, and keep y=0
       + scale_x_date(labels=date_format("%d-%b"), date_breaks = "1 week")
       + theme_bw()
       + xlab("")
       + ylab("Mean Number of Monarchs")
       + ggtitle("Monarch Butterflies in Cape May, NJ")
       + scale_colour_viridis_d()

)



(ggplotly(gg)
 #  %>% layout(legend=list(aes(alpha=1)))
)

The data set contains mean number of monarch butterflies per year in Cape May, NJ on each day from September 1st to October 31st through 1992-2004. There were 10 records with missing data, and these were removed. The intention of the plot was to show both the cyclical changes per season, and the yearly changes. Originally an animated plot by year was used, but it was difficult to compare the year by year changes. This interactive plot was chosen so that a more direct comparison could be made about cyclical and yearly differences and because it would be difficult to view all data at once in a non-iteractive sense. Ideally, it would be better if the default year selection for this graph showed only two years worth of data, as viewing all 13 years worth of data at once is not readable.

It was noted in the paper that there were some cyclical trends at 2 or 3 days, so moving averages were used to determine if these trends were visible and would line up across years. The data points were chosen to be subtle but visible so the moving average lines would stand-out. It is unfortunate that the alpha aesthetic from the data points carries over to the legend because it makes the legend too faint. The y-axis was scaled with a square root transformation to view the smaller values of the data while keeping the zero counts. The tooltip needs to be modified to accurately reflect the day-month values without the year.